Overview

Brought to you by YData

Dataset statistics

Number of variables: 25
Number of observations: 10227
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 1
Duplicate rows (%): < 0.1%
Total size in memory: 2.0 MiB
Average record size in memory: 208.0 B
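Missing cells are counted over every cell of the table, while duplicate rows are counted over whole records. Below is a minimal stdlib sketch of both counts on toy data. It assumes rows are tuples and that a missing value is `None`. It also assumes that, as with pandas `DataFrame.duplicated()`, only the repeated copies of a row count as duplicates. `dataset_stats` is a hypothetical helper, not a ydata-profiling function.

```python
from collections import Counter


def dataset_stats(rows):
    """Return (missing cells, missing %, duplicate rows, duplicate %) for a list of tuples.

    A cell counts as missing when it is None (an assumption of this sketch).
    Each copy of a row beyond its first occurrence counts as one duplicate row.
    """
    n_rows = len(rows)
    n_cells = sum(len(row) for row in rows)
    missing = sum(1 for row in rows for value in row if value is None)
    copies = Counter(rows)
    duplicates = sum(count - 1 for count in copies.values() if count > 1)
    missing_pct = 100.0 * missing / n_cells if n_cells else 0.0
    duplicate_pct = 100.0 * duplicates / n_rows if n_rows else 0.0
    return missing, missing_pct, duplicates, duplicate_pct


rows = [("25-29", "Male"), ("30-34", "Female"), ("25-29", "Male")]
print(dataset_stats(rows))
```

With 1 duplicate among 10227 rows, the duplicate share is roughly 0.01%, which is why the report shows "< 0.1%".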

Variable types

Categorical: 14
Text: 11

Alerts

Dataset has 1 (< 0.1%) duplicate row [Duplicates]
"What programming language would you recommend an aspiring data scientist to learn first?" is highly imbalanced (63.7%) [Imbalance]
"Have you ever used a TPU (tensor processing unit)?" is highly imbalanced (56.7%) [Imbalance]
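The imbalance percentages appear to be a normalized entropy score, 1 - H / ln(k). Here H = -sum(p_i * ln p_i) is the Shannon entropy of the value frequencies, and k is the number of distinct values. Treat the formula as an assumption. As a check, the Never / Once / 2-5 times / 6-24 times / > 25 times counts listed at the end of this report (8252, 956, 768, 134, 117) appear to belong to the TPU question, and they give 56.7%, matching the alert. `imbalance_score` is a hypothetical helper name.

```python
import math


def imbalance_score(counts):
    """Normalized-entropy imbalance: 0 for a uniform split, approaching 1 when one value dominates."""
    k = len(counts)
    if k < 2:
        return 0.0
    total = sum(counts)
    entropy = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log(k)


# Value counts taken from the last variable profiled in this report.
tpu_counts = [8252, 956, 768, 134, 117]
print(f"{100 * imbalance_score(tpu_counts):.1f}%")  # 56.7%
```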

Reproduction

Analysis started: 2024-11-04 16:43:35.148664
Analysis finished: 2024-11-04 16:43:44.401435
Duration: 9.25 seconds
Software version: ydata-profiling v4.12.0
Download configuration: config.json
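The reported duration is the difference between the two timestamps above. A quick check with the standard library:

```python
from datetime import datetime

started = datetime.fromisoformat("2024-11-04 16:43:35.148664")
finished = datetime.fromisoformat("2024-11-04 16:43:44.401435")
elapsed = (finished - started).total_seconds()
print(f"Duration: {elapsed:.2f} seconds")  # Duration: 9.25 seconds
```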

Variables

Distinct: 11
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 159.8 KiB
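Distinct (%) is the number of distinct values divided by the 10227 observations. The report seems to print "< 0.1%" whenever a nonzero share rounds to 0.0 at one decimal place. The sketch below encodes that rule under a hypothetical name, `format_share`. It agrees with every Distinct (%) and Unique (%) value in this report: for example, 6 distinct values give "0.1%", while 5 give "< 0.1%".

```python
def format_share(count, total):
    """Format count/total as a one-decimal percentage, using '< 0.1%' for tiny nonzero shares."""
    pct = 100.0 * count / total
    text = f"{pct:.1f}"
    if count > 0 and text == "0.0":
        return "< 0.1%"
    return f"{text}%"


N = 10227  # number of observations in this dataset
for distinct in (11, 2, 59, 6, 5, 3469):
    print(distinct, format_share(distinct, N))
```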
25-29             2523
30-34             2064
35-39             1420
22-24             1306
40-44              969
Other values (6)  1945
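The overview shows the five most frequent values and folds the remainder into an "Other values (n)" row. The sketch below uses `collections.Counter.most_common` on the seven counts of the variable whose values run from 0 to 20+ (its common-values table appears further down). It reproduces that variable's "Other values (2) 1060" row. `top_values` is a hypothetical name, and ties between equal counts get no special handling.

```python
from collections import Counter


def top_values(counts: Counter, k: int = 5):
    """Return the k most frequent (value, count) pairs, plus an 'Other values (n)' row if needed."""
    top = counts.most_common(k)
    shown = {value for value, _ in top}
    others = [count for value, count in counts.items() if value not in shown]
    if others:
        top.append((f"Other values ({len(others)})", sum(others)))
    return top


example_counts = Counter({"20+": 2416, "1-2": 2306, "3-4": 1792, "5-9": 1421,
                          "0": 1232, "10-14": 738, "15-19": 322})
for value, count in top_values(example_counts):
    print(f"{value:<18}{count:>6}")
```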

Length

Max length: 5
Median length: 5
Mean length: 4.9906131
Min length: 3

Characters and Unicode

Total characters: 51039
Distinct characters: 12
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%
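"Unique" here appears to mean the number of values that occur exactly once, which is different from Distinct. It is 0 for every grouped answer in this report but 3030 for the free-form combined column further down. The minimal sketch below runs on the five sample rows of this variable; `unique_count` is a hypothetical name.

```python
from collections import Counter


def unique_count(values):
    """Number of values that appear exactly once (not the number of distinct values)."""
    counts = Counter(values)
    return sum(1 for count in counts.values() if count == 1)


column = ["22-24", "40-44", "22-24", "50-54", "22-24"]
print(len(set(column)), unique_count(column))  # 3 distinct values, 2 of them unique
```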

Sample

1st row: 22-24
2nd row: 40-44
3rd row: 22-24
4th row: 50-54
5th row: 22-24

Common Values

Value  Count   Frequency (%)
25-29   2523   24.7%
30-34   2064   20.2%
35-39   1420   13.9%
22-24   1306   12.8%
40-44    969    9.5%
45-49    642    6.3%
50-54    464    4.5%
18-21    317    3.1%
55-59    264    2.6%
60-69    210    2.1%

Length

[Figure: histogram of lengths of the category]

Most occurring characters

Value             Count   Frequency (%)
-                 10179   19.9%
2                  9281   18.2%
4                  8025   15.7%
3                  6968   13.7%
5                  6305   12.4%
9                  5059    9.9%
0                  3755    7.4%
1                   634    1.2%
6                   420    0.8%
8                   317    0.6%
Other values (2)     96    0.2%

Most occurring categories

Value      Count   Frequency (%)
(unknown)  51039   100.0%

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 159.8 KiB

Male    8800
Female  1427

Length

Max length: 6
Median length: 4
Mean length: 4.2790652
Min length: 4
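The length statistics follow directly from the value counts. The sketch below recomputes the max, mean, and min lengths and the total character count for this variable from its two counts (Male 8800, Female 1427). The median is deliberately left out. The report's median rule is not documented here, and it does not always equal a plain per-row median: for the employee-count variable, a per-row median gives 17 rather than the reported 18. `length_stats` is a hypothetical name.

```python
from collections import Counter


def length_stats(counts: Counter):
    """Max, mean, and min string length plus total characters, weighting each value by its count."""
    total_rows = sum(counts.values())
    total_chars = sum(len(value) * n for value, n in counts.items())
    lengths = [len(value) for value in counts]
    return max(lengths), total_chars / total_rows, min(lengths), total_chars


gender = Counter({"Male": 8800, "Female": 1427})
max_len, mean_len, min_len, total_chars = length_stats(gender)
print(max_len, f"{mean_len:.7f}", min_len, total_chars)
```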

Characters and Unicode

Total characters: 43762
Distinct characters: 6
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Male
2nd row: Male
3rd row: Male
4th row: Male
5th row: Male

Common Values

Value   Count   Frequency (%)
Male     8800   86.0%
Female   1427   14.0%

Length

[Figure: histogram of lengths of the category]

[Figure: Common Values (Plot)]

Words (lowercased)

Value   Count   Frequency (%)
male     8800   86.0%
female   1427   14.0%

Most occurring characters

Value  Count   Frequency (%)
e      11654   26.6%
a      10227   23.4%
l      10227   23.4%
M       8800   20.1%
F       1427    3.3%
m       1427    3.3%

Most occurring categories

Value      Count   Frequency (%)
(unknown)  43762   100.0%

Distinct: 59
Distinct (%): 0.6%
Missing: 0
Missing (%): 0.0%
Memory size: 159.8 KiB

Length

Max length: 52
Median length: 28
Mean length: 10.895571
Min length: 4

Characters and Unicode

Total characters: 111429
Distinct characters: 49
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: France
2nd row: Australia
3rd row: India
4th row: France
5th row: India

Words (lowercased)

Value              Count   Frequency (%)
of                  2202   12.0%
united              2122   11.6%
india               1879   10.2%
states              1838   10.0%
america             1838   10.0%
other                529    2.9%
brazil               456    2.5%
japan                402    2.2%
russia               362    2.0%
ireland              320    1.7%
Other values (63)   6392   34.9%

Most occurring characters

Value              Count   Frequency (%)
a                  13320   12.0%
i                  10131    9.1%
e                   9931    8.9%
n                   8714    7.8%
t                   8233    7.4%
(space)             8113    7.3%
r                   6513    5.8%
d                   5733    5.1%
o                   4093    3.7%
s                   3342    3.0%
Other values (39)  33306   29.9%

Most occurring categories

Value       Count   Frequency (%)
(unknown)  111429   100.0%

Distinct: 7
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 159.8 KiB

Master’s degree                                                    4882
Bachelor’s degree                                                  2680
Doctoral degree                                                    1798
Professional degree                                                 353
Some college/university study without earning a bachelor’s degree   308
Other values (2)                                                    206

Length

Max length: 65
Median length: 15
Mean length: 17.436296
Min length: 15

Characters and Unicode

Total characters: 178321
Distinct characters: 31
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Master’s degree
2nd row: Master’s degree
3rd row: Bachelor’s degree
4th row: Master’s degree
5th row: Master’s degree

Common Values

Value                                                              Count   Frequency (%)
Master’s degree                                                     4882   47.7%
Bachelor’s degree                                                   2680   26.2%
Doctoral degree                                                     1798   17.6%
Professional degree                                                  353    3.5%
Some college/university study without earning a bachelor’s degree    308    3.0%
I prefer not to answer                                               113    1.1%
No formal education past high school                                  93    0.9%

Length

[Figure: histogram of lengths of the category]

[Figure: Common Values (Plot)]

Words (lowercased)

Value               Count   Frequency (%)
degree              10021   43.5%
master’s             4882   21.2%
bachelor’s           2988   13.0%
doctoral             1798    7.8%
professional          353    1.5%
some                  308    1.3%
college/university    308    1.3%
study                 308    1.3%
without               308    1.3%
earning               308    1.3%
Other values (12)    1431    6.2%

Most occurring characters

Value              Count   Frequency (%)
e                  40258   22.6%
r                  21090   11.8%
s                  14373    8.1%
(space)            12786    7.2%
a                  11029    6.2%
g                  10730    6.0%
d                  10422    5.8%
o                   8905    5.0%
t                   8324    4.7%
’                   7870    4.4%
Other values (21)  32534   18.2%

Most occurring categories

Value       Count   Frequency (%)
(unknown)  178321   100.0%
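The character table counts every character across all rows, so each distinct answer contributes its characters multiplied by its count. Recomputing the table from the seven education answers confirms two entries. The 12786 entry is the space character. The 7870 entry is the typographic apostrophe ’ (4882 "Master’s" + 2680 "Bachelor’s" + 308 "bachelor’s"). A stdlib sketch follows; `char_counts` is a hypothetical name.

```python
from collections import Counter

# Education answers and their counts, copied from the common-values table.
education = {
    "Master\u2019s degree": 4882,
    "Bachelor\u2019s degree": 2680,
    "Doctoral degree": 1798,
    "Professional degree": 353,
    "Some college/university study without earning a bachelor\u2019s degree": 308,
    "I prefer not to answer": 113,
    "No formal education past high school": 93,
}


def char_counts(value_counts):
    """Count characters over all rows: each character of a value counts once per row holding that value."""
    chars = Counter()
    for value, n in value_counts.items():
        for ch, k in Counter(value).items():
            chars[ch] += k * n
    return chars


chars = char_counts(education)
total = sum(chars.values())
for ch, n in chars.most_common(5):
    print(repr(ch), n, f"{100 * n / total:.1f}%")
```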

Distinct: 10
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 159.8 KiB

Data Scientist      3243
Software Engineer   1842
Data Analyst        1153
Other               1118
Research Scientist  1072
Other values (5)    1799

Length

Max length: 23
Median length: 18
Mean length: 14.307324
Min length: 5

Characters and Unicode

Total characters: 146321
Distinct characters: 30
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Software Engineer
2nd row: Other
3rd row: Other
4th row: Data Scientist
5th row: Data Scientist

Common Values

Value                    Count   Frequency (%)
Data Scientist            3243   31.7%
Software Engineer         1842   18.0%
Data Analyst              1153   11.3%
Other                     1118   10.9%
Research Scientist        1072   10.5%
Product/Project Manager    530    5.2%
Business Analyst           509    5.0%
Data Engineer              448    4.4%
Statistician               203    2.0%
DBA/Database Engineer      109    1.1%

Length

[Figure: histogram of lengths of the category]

[Figure: Common Values (Plot)]

Words (lowercased)

Value             Count   Frequency (%)
data               4844   25.3%
scientist          4315   22.6%
engineer           2399   12.5%
software           1842    9.6%
analyst            1662    8.7%
other              1118    5.8%
research           1072    5.6%
product/project     530    2.8%
manager             530    2.8%
business            509    2.7%
Other values (2)    312    1.6%

Most occurring characters

Value              Count   Frequency (%)
t                  19874   13.6%
a                  16057   11.0%
e                  15895   10.9%
i                  12147    8.3%
n                  12017    8.2%
(space)             8906    6.1%
s                   8888    6.1%
r                   8021    5.5%
c                   6650    4.5%
S                   6360    4.3%
Other values (20)  31506   21.5%

Most occurring categories

Value       Count   Frequency (%)
(unknown)  146321   100.0%

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 159.8 KiB

0-49 employees        2849
> 10,000 employees    2327
1000-9,999 employees  2010
50-249 employees      1687
250-999 employees     1354

Length

Max length: 20
Median length: 18
Mean length: 16.816466
Min length: 14

Characters and Unicode

Total characters: 171982
Distinct characters: 17
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1000-9,999 employees
2nd row: > 10,000 employees
3rd row: 0-49 employees
4th row: 0-49 employees
5th row: 50-249 employees

Common Values

Value                 Count   Frequency (%)
0-49 employees         2849   27.9%
> 10,000 employees     2327   22.8%
1000-9,999 employees   2010   19.7%
50-249 employees       1687   16.5%
250-999 employees      1354   13.2%

Length

[Figure: histogram of lengths of the category]

[Figure: Common Values (Plot)]

Words (lowercased)

Value       Count   Frequency (%)
employees   10227   44.9%
0-49         2849   12.5%
(blank)      2327   10.2%
10,000       2327   10.2%
1000-9,999   2010    8.8%
50-249       1687    7.4%
250-999      1354    5.9%

Most occurring characters

Value             Count   Frequency (%)
e                 30681   17.8%
0                 21228   12.3%
9                 16638    9.7%
(space)           12554    7.3%
o                 10227    5.9%
s                 10227    5.9%
y                 10227    5.9%
l                 10227    5.9%
p                 10227    5.9%
m                 10227    5.9%
Other values (7)  29519   17.2%

Most occurring categories

Value       Count   Frequency (%)
(unknown)  171982   100.0%

Distinct: 7
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 159.8 KiB

20+               2416
1-2               2306
3-4               1792
5-9               1421
0                 1232
Other values (2)  1060

Length

Max length: 5
Median length: 3
Mean length: 2.9663635
Min length: 1

Characters and Unicode

Total characters: 30337
Distinct characters: 9
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 20+
3rd row: 0
4th row: 3-4
5th row: 20+

Common Values

Value  Count   Frequency (%)
20+     2416   23.6%
1-2     2306   22.5%
3-4     1792   17.5%
5-9     1421   13.9%
0       1232   12.0%
10-14    738    7.2%
15-19    322    3.1%

Length

[Figure: histogram of lengths of the category]

[Figure: Common Values (Plot)]

Words (lowercased)

Value  Count   Frequency (%)
20      2416   23.6%
1-2     2306   22.5%
3-4     1792   17.5%
5-9     1421   13.9%
0       1232   12.0%
10-14    738    7.2%
15-19    322    3.1%

Most occurring characters

Value  Count   Frequency (%)
-       6579   21.7%
2       4722   15.6%
1       4426   14.6%
0       4386   14.5%
4       2530    8.3%
+       2416    8.0%
3       1792    5.9%
5       1743    5.7%
9       1743    5.7%

Most occurring categories

Value      Count   Frequency (%)
(unknown)  30337   100.0%

Distinct: 6
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 159.8 KiB

We recently started using ML methods (i.e., models in production for less than 2 years)    2236
We are exploring ML methods (and may one day put a model into production)                  2195
We have well established ML methods (i.e., models in production for more than 2 years)     2077
No (we do not use ML methods)                                                              1737
We use ML methods for generating insights (but do not put working models into production)  1246

Length

Max length: 89
Median length: 86
Mean length: 68.859294
Min length: 13

Characters and Unicode

Total characters: 704224
Distinct characters: 34
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: I do not know
2nd row: I do not know
3rd row: No (we do not use ML methods)
4th row: We have well established ML methods (i.e., models in production for more than 2 years)
5th row: We are exploring ML methods (and may one day put a model into production)

Common Values

Value                                                                                      Count   Frequency (%)
We recently started using ML methods (i.e., models in production for less than 2 years)     2236   21.9%
We are exploring ML methods (and may one day put a model into production)                   2195   21.5%
We have well established ML methods (i.e., models in production for more than 2 years)      2077   20.3%
No (we do not use ML methods)                                                               1737   17.0%
We use ML methods for generating insights (but do not put working models into production)   1246   12.2%
I do not know                                                                                736    7.2%

Length

[Figure: histogram of lengths of the category]

[Figure: Common Values (Plot)]

Words (lowercased)

Value              Count   Frequency (%)
we                  9491    7.3%
ml                  9491    7.3%
methods             9491    7.3%
production          7754    6.0%
models              5559    4.3%
for                 5559    4.3%
years               4313    3.3%
i.e                 4313    3.3%
in                  4313    3.3%
than                4313    3.3%
Other values (29)  64621   50.0%

Most occurring characters

Value               Count   Frequency (%)
(space)            118991   16.9%
e                   66751    9.5%
o                   59377    8.4%
t                   44682    6.3%
n                   40317    5.7%
s                   37936    5.4%
d                   37421    5.3%
i                   31313    4.4%
r                   31057    4.4%
a                   27237    3.9%
Other values (24)  209142   29.7%

Most occurring categories

Value       Count   Frequency (%)
(unknown)  704224   100.0%

Distinct: 25
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 159.8 KiB

$0-999             1064
10,000-14,999       685
100,000-124,999     649
30,000-39,999       633
40,000-49,999       622
Other values (20)  6574

Length

Max length: 15
Median length: 13
Mean length: 12.215019
Min length: 6

Characters and Unicode

Total characters: 124923
Distinct characters: 15
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 30,000-39,999
2nd row: 250,000-299,999
3rd row: 4,000-4,999
4th row: 60,000-69,999
5th row: 10,000-14,999

Common Values

Value              Count   Frequency (%)
$0-999              1064   10.4%
10,000-14,999        685    6.7%
100,000-124,999      649    6.3%
30,000-39,999        633    6.2%
40,000-49,999        622    6.1%
50,000-59,999        604    5.9%
60,000-69,999        501    4.9%
70,000-79,999        458    4.5%
15,000-19,999        452    4.4%
20,000-24,999        438    4.3%
Other values (15)   4121   40.3%

Length

[Figure: histogram of lengths of the category]

Words (lowercased)

Value              Count   Frequency (%)
0-999               1064   10.4%
10,000-14,999        685    6.7%
100,000-124,999      649    6.3%
30,000-39,999        633    6.2%
40,000-49,999        622    6.1%
50,000-59,999        604    5.9%
60,000-69,999        501    4.9%
70,000-79,999        458    4.5%
15,000-19,999        452    4.4%
20,000-24,999        438    4.3%
Other values (16)   4169   40.6%

Most occurring characters

Value             Count   Frequency (%)
9                 36688   29.4%
0                 35359   28.3%
,                 18278   14.6%
-                 10179    8.1%
1                  6036    4.8%
4                  4480    3.6%
5                  3785    3.0%
2                  3758    3.0%
3                  1796    1.4%
7                  1656    1.3%
Other values (5)   2908    2.3%

Most occurring categories

Value       Count   Frequency (%)
(unknown)  124923   100.0%

Distinct: 6
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 159.8 KiB

$0 (USD)         3161
$100-$999        1995
$1000-$9,999     1859
$1-$99           1215
$10,000-$99,999  1128

Length

Max length: 17
Median length: 15
Mean length: 10.221375
Min length: 6

Characters and Unicode

Total characters: 104534
Distinct characters: 13
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: $0 (USD)
2nd row: $10,000-$99,999
3rd row: $0 (USD)
4th row: $10,000-$99,999
5th row: $100-$999

Common Values

Value              Count   Frequency (%)
$0 (USD)            3161   30.9%
$100-$999           1995   19.5%
$1000-$9,999        1859   18.2%
$1-$99              1215   11.9%
$10,000-$99,999     1128   11.0%
> $100,000 ($USD)    869    8.5%

Length

[Figure: histogram of lengths of the category]

[Figure: Common Values (Plot)]

Words (lowercased)

Value           Count   Frequency (%)
usd              4030   26.6%
0                3161   20.9%
100-$999         1995   13.2%
1000-$9,999      1859   12.3%
1-$99            1215    8.0%
10,000-$99,999   1128    7.5%
(blank)           869    5.7%
100,000           869    5.7%
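The word tables appear to be built in three steps: lowercase each answer, split it on whitespace, and strip ASCII punctuation from both ends of every token. That tokenization is an assumption, but it reproduces this table exactly when applied to the six answers of this variable. That includes the "usd" count (3161 + 869 = 4030) and the blank token with count 869, which is what ">" becomes once its only character is stripped. Because `string.punctuation` is ASCII-only, it would also leave the ’ in "master’s" intact, consistent with the education word table. `word_counts` is a hypothetical name.

```python
import string
from collections import Counter

# Answers and counts copied from the common-values table of this variable.
usd_ranges = {
    "$0 (USD)": 3161,
    "$100-$999": 1995,
    "$1000-$9,999": 1859,
    "$1-$99": 1215,
    "$10,000-$99,999": 1128,
    "> $100,000 ($USD)": 869,
}


def word_counts(value_counts):
    """Lowercase, split on whitespace, and strip leading/trailing ASCII punctuation from each token."""
    words = Counter()
    for value, n in value_counts.items():
        for token in value.lower().split():
            words[token.strip(string.punctuation)] += n
    return words


words = word_counts(usd_ranges)
total = sum(words.values())
for word, n in words.most_common():
    print(repr(word), n, f"{100 * n / total:.1f}%")
```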

Most occurring characters

Value             Count   Frequency (%)
0                 21585   20.6%
9                 21491   20.6%
$                 17293   16.5%
1                  7066    6.8%
-                  6197    5.9%
,                  4984    4.8%
(space)            4899    4.7%
(                  4030    3.9%
U                  4030    3.9%
S                  4030    3.9%
Other values (3)   8929    8.5%

Most occurring categories

Value       Count   Frequency (%)
(unknown)  104534   100.0%

Distinct: 3469
Distinct (%): 33.9%
Missing: 0
Missing (%): 0.0%
Memory size: 159.8 KiB

Length

Max length: 89
Median length: 86
Mean length: 75.376161
Min length: 25

Characters and Unicode

Total characters: 770872
Distinct characters: 55
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 3030
Unique (%): 29.6%

Sample

1st row: Basic statistical software (Microsoft Excel, Google Sheets, etc.), 0, -1, -1, -1, -1
2nd row: Local development environments (RStudio, JupyterLab, etc.), -1, -1, -1, 0, -1
3rd row: Local development environments (RStudio, JupyterLab, etc.), -1, -1, -1, 1, -1
4th row: Advanced statistical software (SPSS, SAS, etc.), -1, 0, -1, -1, -1
5th row: Local development environments (RStudio, JupyterLab, etc.), -1, -1, -1, 2, -1

Words (lowercased)

Value                Count   Frequency (%)
1                    42435   36.7%
etc                   9496    8.2%
local                 5586    4.8%
development           5586    4.8%
environments          5586    4.8%
rstudio               5586    4.8%
jupyterlab            5586    4.8%
software              3910    3.4%
statistical           2288    2.0%
basic                 1653    1.4%
Other values (2157)  27942   24.2%

Most occurring characters

Value               Count   Frequency (%)
(space)            105427   13.7%
,                   71749    9.3%
e                   62224    8.1%
t                   48953    6.4%
1                   45973    6.0%
-                   42584    5.5%
o                   35168    4.6%
a                   26812    3.5%
n                   25019    3.2%
c                   24324    3.2%
Other values (45)  282639   36.7%

Most occurring categories

Value       Count   Frequency (%)
(unknown)  770872   100.0%

Distinct: 6
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 159.8 KiB

3-5 years    2672
1-2 years    2542
< 1 years    1892
5-10 years   1663
10-20 years   955

Length

Max length: 11
Median length: 9
Mean length: 9.3493693
Min length: 9

Characters and Unicode

Total characters: 95616
Distinct characters: 14
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1-2 years
2nd row: 1-2 years
3rd row: < 1 years
4th row: 20+ years
5th row: 3-5 years

Common Values

Value        Count   Frequency (%)
3-5 years     2672   26.1%
1-2 years     2542   24.9%
< 1 years     1892   18.5%
5-10 years    1663   16.3%
10-20 years    955    9.3%
20+ years      503    4.9%

Length

[Figure: histogram of lengths of the category]

[Figure: Common Values (Plot)]

Words (lowercased)

Value    Count   Frequency (%)
years    10227   45.8%
3-5       2672   12.0%
1-2       2542   11.4%
(blank)   1892    8.5%
1         1892    8.5%
5-10      1663    7.4%
10-20      955    4.3%
20         503    2.3%

Most occurring characters

Value             Count   Frequency (%)
(space)           12119   12.7%
y                 10227   10.7%
e                 10227   10.7%
a                 10227   10.7%
r                 10227   10.7%
s                 10227   10.7%
-                  7832    8.2%
1                  7052    7.4%
5                  4335    4.5%
0                  4076    4.3%
Other values (4)   9067    9.5%

Most occurring categories

Value      Count   Frequency (%)
(unknown)  95616   100.0%

Distinct: 12
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 159.8 KiB

Python            7880
R                 1036
SQL                711
C++                122
MATLAB             110
Other values (7)   368

Length

Max length: 10
Median length: 6
Mean length: 5.186565
Min length: 1

Characters and Unicode

Total characters: 53043
Distinct characters: 27
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Python
2nd row: Python
3rd row: Python
4th row: Java
5th row: Python

Common Values

Value             Count   Frequency (%)
Python             7880   77.1%
R                  1036   10.1%
SQL                 711    7.0%
C++                 122    1.2%
MATLAB              110    1.1%
Other               102    1.0%
C                    80    0.8%
Java                 64    0.6%
None                 56    0.5%
Javascript           34    0.3%
Other values (2)     32    0.3%

Length

[Figure: histogram of lengths of the category]

Words (lowercased)

Value       Count   Frequency (%)
python       7880   77.1%
r            1036   10.1%
sql           711    7.0%
c             202    2.0%
matlab        110    1.1%
other         102    1.0%
java           64    0.6%
none           56    0.5%
javascript     34    0.3%
bash           27    0.3%

Most occurring characters

Value              Count   Frequency (%)
t                   8021   15.1%
h                   8009   15.1%
o                   7936   15.0%
n                   7936   15.0%
y                   7885   14.9%
P                   7880   14.9%
R                   1036    2.0%
L                    821    1.5%
S                    716    1.3%
Q                    711    1.3%
Other values (17)   2092    3.9%

Most occurring categories

Value      Count   Frequency (%)
(unknown)  53043   100.0%

Distinct5
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size159.8 KiB
Never
8252 
Once
956 
2-5 times
 
768
6-24 times
 
134
> 25 times
 
117

Length

Max length10
Median length5
Mean length5.3296177
Min length4
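These four statistics are plain string-length summaries. As a check, the sketch below rebuilds the column from this variable's Common Values counts (Never 8252, Once 956, 2-5 times 768, 6-24 times 134, > 25 times 117) and recomputes them with Python's `statistics` module; `length_summary` and `expand` are hypothetical helpers.

```python
import statistics

def length_summary(answers):
    """Max, median, mean and min string length of a list of answers."""
    lengths = [len(a) for a in answers]
    return {
        "max": max(lengths),
        "median": statistics.median(lengths),
        "mean": statistics.fmean(lengths),
        "min": min(lengths),
    }

# Counts copied from this variable's Common Values table.
TPU_COUNTS = {"Never": 8252, "Once": 956, "2-5 times": 768, "6-24 times": 134, "> 25 times": 117}

def expand(counts):
    """Rebuild one answer per respondent from a value -> count mapping."""
    return [value for value, n in counts.items() for _ in range(n)]

if __name__ == "__main__":
    print(length_summary(expand(TPU_COUNTS)))
```

It returns a maximum of 10, a median of 5, a minimum of 4 and a mean of 54506 / 10227 ≈ 5.3296177, matching the figures above; 54506 is also the Total characters value reported for this variable.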

Characters and Unicode

Total characters54506
Distinct characters18
Distinct categories1
Distinct scripts1
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0
Unique (%)0.0%

Sample

1st rowNever
2nd rowOnce
3rd rowNever
4th rowNever
5th row6-24 times

Common Values

Value | Count | Frequency (%)
Never | 8252 | 80.7%
Once | 956 | 9.3%
2-5 times | 768 | 7.5%
6-24 times | 134 | 1.3%
> 25 times | 117 | 1.1%
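The overview flags this question as highly imbalanced (56.7%). That percentage can be reproduced as one minus the normalized Shannon entropy of the category counts, 1 - H / ln(k) for k categories. Treating this as ydata-profiling's exact formula is an assumption, though the same calculation on the recommended-language counts also yields the 63.7% reported for that question (taking its "Other values (2)" row as Bash 27 plus one answer with 5). `imbalance` is a hypothetical helper.

```python
import math

def imbalance(counts):
    """Return 1 - H / ln(k): 0 for a uniform distribution, close to 1 when one category dominates."""
    k = len(counts)
    if k < 2:
        return 0.0  # undefined for a single category; 0.0 is a convention chosen here
    total = sum(counts)
    entropy = -sum((n / total) * math.log(n / total) for n in counts if n > 0)
    return 1 - entropy / math.log(k)

if __name__ == "__main__":
    tpu_counts = [8252, 956, 768, 134, 117]  # Never, Once, 2-5, 6-24, > 25 times
    print(f"{imbalance(tpu_counts):.1%}")
```

It prints `56.7%`.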

Length

Histogram of lengths of the category

Words

Value | Count | Frequency (%)
never | 8252 | 72.6%
times | 1019 | 9.0%
once | 956 | 8.4%
2-5 | 768 | 6.8%
6-24 | 134 | 1.2%
(empty token) | 117 | 1.0%
25 | 117 | 1.0%

Most occurring characters

Value | Count | Frequency (%)
e | 18479 | 33.9%
N | 8252 | 15.1%
v | 8252 | 15.1%
r | 8252 | 15.1%
(space) | 1136 | 2.1%
s | 1019 | 1.9%
2 | 1019 | 1.9%
t | 1019 | 1.9%
i | 1019 | 1.9%
m | 1019 | 1.9%
Other values (8) | 5040 | 9.2%

Most occurring categories, scripts and blocks

All 54506 characters fall into a single "(unknown)" category, script and block (100.0%), so the per-category, per-script and per-block character breakdowns are identical to the "Most occurring characters" table.
For how many years have you used machine learning methods?

Distinct8
Distinct (%)0.1%
Missing0
Missing (%)0.0%
Memory size159.8 KiB
< 1 years | 2966
1-2 years | 2641
2-3 years | 1526
3-4 years | 946
4-5 years | 850
Other values (3) | 1298

Length

Max length11
Median length9
Mean length9.1413904
Min length9

Characters and Unicode

Total characters93489
Distinct characters15
Distinct categories1
Distinct scripts1
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0
Unique (%)0.0%

Sample

1st row1-2 years
2nd row2-3 years
3rd row< 1 years
4th row10-15 years
5th row2-3 years

Common Values

Value | Count | Frequency (%)
< 1 years | 2966 | 29.0%
1-2 years | 2641 | 25.8%
2-3 years | 1526 | 14.9%
3-4 years | 946 | 9.3%
4-5 years | 850 | 8.3%
5-10 years | 808 | 7.9%
10-15 years | 319 | 3.1%
20+ years | 171 | 1.7%

Length

Histogram of lengths of the category

Words

Value | Count | Frequency (%)
years | 10227 | 43.7%
(empty token) | 2966 | 12.7%
1 | 2966 | 12.7%
1-2 | 2641 | 11.3%
2-3 | 1526 | 6.5%
3-4 | 946 | 4.0%
4-5 | 850 | 3.6%
5-10 | 808 | 3.5%
10-15 | 319 | 1.4%
20 | 171 | 0.7%

Most occurring characters

Value | Count | Frequency (%)
(space) | 13193 | 14.1%
y | 10227 | 10.9%
e | 10227 | 10.9%
a | 10227 | 10.9%
r | 10227 | 10.9%
s | 10227 | 10.9%
- | 7090 | 7.6%
1 | 7053 | 7.5%
2 | 4338 | 4.6%
< | 2966 | 3.2%
Other values (5) | 7714 | 8.3%

Most occurring categories, scripts and blocks

All 93489 characters fall into a single "(unknown)" category, script and block (100.0%), so the per-category, per-script and per-block character breakdowns are identical to the "Most occurring characters" table.
Who/what are your favorite media sources that report on data science topics?

Distinct902
Distinct (%)8.8%
Missing0
Missing (%)0.0%
Memory size159.8 KiB

Length

Max length505
Median length395
Mean length163.61435
Min length4

Characters and Unicode

Total characters1673284
Distinct characters49
Distinct categories1
Distinct scripts1
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique304
Unique (%)3.0%
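Distinct counts the different answer strings and Unique counts the answers that occur exactly once, both as a share of all 10227 rows: 902 / 10227 ≈ 8.8% and 304 / 10227 ≈ 3.0%. For a multi-select question like this one, each distinct combination of ticked options is a separate value. A minimal sketch with a hypothetical helper name:

```python
from collections import Counter

def distinct_and_unique(answers):
    """Return (distinct, distinct %, unique, unique %) over all rows."""
    counts = Counter(answers)
    n = len(answers)
    distinct = len(counts)                                  # different values
    unique = sum(1 for c in counts.values() if c == 1)      # values seen exactly once
    return distinct, 100 * distinct / n, unique, 100 * unique / n
```

For `["a", "a", "b", "c"]` it returns `(3, 75.0, 2, 50.0)`.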

Sample

1st rowTwitter (data science influencers), Kaggle (forums, blog, social media, etc), Blogs (Towards Data Science, Medium, Analytics Vidhya, KDnuggets etc), Journal Publications (traditional publications, preprint journals, etc)
2nd rowPodcasts (Chai Time Data Science, Linear Digressions, etc), Blogs (Towards Data Science, Medium, Analytics Vidhya, KDnuggets etc), Journal Publications (traditional publications, preprint journals, etc), Slack Communities (ods.ai, kagglenoobs, etc)
3rd rowYouTube (Cloud AI Adventures, Siraj Raval, etc), Blogs (Towards Data Science, Medium, Analytics Vidhya, KDnuggets etc), Other
4th rowYouTube (Cloud AI Adventures, Siraj Raval, etc), Blogs (Towards Data Science, Medium, Analytics Vidhya, KDnuggets etc), Journal Publications (traditional publications, preprint journals, etc)
5th rowKaggle (forums, blog, social media, etc), Course Forums (forums.fast.ai, etc), YouTube (Cloud AI Adventures, Siraj Raval, etc), Podcasts (Chai Time Data Science, Linear Digressions, etc), Journal Publications (traditional publications, preprint journals, etc)

Words

Value | Count | Frequency (%)
etc | 28447 | 13.9%
data | 10481 | 5.1%
science | 10481 | 5.1%
forums | 9222 | 4.5%
kaggle | 6783 | 3.3%
blog | 6783 | 3.3%
social | 6783 | 3.3%
media | 6783 | 3.3%
kdnuggets | 6550 | 3.2%
vidhya | 6550 | 3.2%
Other values (36) | 105472 | 51.6%
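Because the word table tokenizes answers, generic words from option descriptions such as "etc", "data" and "science" rank highest. Counting whole options instead means splitting each answer on the commas that separate options while leaving the commas inside parentheses alone. The sketch below assumes option descriptions only contain commas inside parentheses, which holds for the sample rows shown; `split_options` and `option_counts` are hypothetical helpers.

```python
import re
from collections import Counter

# A comma is a separator only when no ")" follows before the next "(".
SEPARATOR = re.compile(r"\s*,\s*(?![^()]*\))")

def split_options(answer):
    """Split one multi-select answer into options, keeping parenthesised lists intact."""
    return [part.strip() for part in SEPARATOR.split(answer) if part.strip()]

def option_counts(answers):
    """Count how many respondents selected each option."""
    return Counter(option for answer in answers for option in split_options(answer))
```

For example, `split_options("Twitter (data science influencers), Kaggle (forums, blog, social media, etc), Other")` returns the three options `Twitter (data science influencers)`, `Kaggle (forums, blog, social media, etc)` and `Other`.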

Most occurring characters

Value | Count | Frequency (%)
(space) | 194108 | 11.6%
e | 124958 | 7.5%
a | 117653 | 7.0%
i | 99034 | 5.9%
t | 90903 | 5.4%
, | 90663 | 5.4%
s | 86036 | 5.1%
c | 84486 | 5.0%
o | 78256 | 4.7%
n | 68808 | 4.1%
Other values (39) | 638379 | 38.2%

Most occurring categories, scripts and blocks

All 1673284 characters fall into a single "(unknown)" category, script and block (100.0%), so the per-category, per-script and per-block character breakdowns are identical to the "Most occurring characters" table.
On which platforms have you begun or completed data science courses?

Distinct713
Distinct (%)7.0%
Missing0
Missing (%)0.0%
Memory size159.8 KiB

Length

Max length169
Median length148
Mean length41.713601
Min length3

Characters and Unicode

Total characters426605
Distinct characters35
Distinct categories1
Distinct scripts1
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique244
Unique (%)2.4%

Sample

1st rowCoursera, DataCamp, Kaggle Courses (i.e. Kaggle Learn), Udemy
2nd rowCoursera, edX, DataCamp, University Courses (resulting in a university degree)
3rd rowOther
4th rowNone
5th rowUdacity, Coursera, edX, Kaggle Courses (i.e. Kaggle Learn), Udemy

Words

Value | Count | Frequency (%)
kaggle | 6470 | 11.5%
courses | 5999 | 10.6%
coursera | 5810 | 10.3%
university | 5528 | 9.8%
learn | 3235 | 5.7%
i.e | 3235 | 5.7%
udemy | 3115 | 5.5%
a | 2764 | 4.9%
degree | 2764 | 4.9%
resulting | 2764 | 4.9%
Other values (10) | 14685 | 26.1%

Most occurring characters

Value | Count | Frequency (%)
e | 50571 | 11.9%
(space) | 46142 | 10.8%
r | 33821 | 7.9%
a | 32264 | 7.6%
s | 27657 | 6.5%
i | 24626 | 5.8%
g | 19304 | 4.5%
n | 18367 | 4.3%
u | 17805 | 4.2%
t | 16101 | 3.8%
Other values (25) | 139947 | 32.8%

Most occurring categories, scripts and blocks

All 426605 characters fall into a single "(unknown)" category, script and block (100.0%), so the per-category, per-script and per-block character breakdowns are identical to the "Most occurring characters" table.
Which of the following integrated development environments (IDE's) do you use on a regular basis?

Distinct752
Distinct (%)7.4%
Missing0
Missing (%)0.0%
Memory size159.8 KiB

Length

Max length185
Median length158
Mean length65.056322
Min length4

Characters and Unicode

Total characters665331
Distinct characters39
Distinct categories1
Distinct scripts1
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique260
Unique (%)2.5%

Sample

1st rowJupyter (JupyterLab, Jupyter Notebooks, etc) , RStudio , PyCharm , MATLAB , Spyder
2nd rowJupyter (JupyterLab, Jupyter Notebooks, etc) , Visual Studio / Visual Studio Code
3rd rowJupyter (JupyterLab, Jupyter Notebooks, etc)
4th row RStudio , Other
5th rowJupyter (JupyterLab, Jupyter Notebooks, etc) , Spyder , Notepad++ , Sublime Text

Words

Value | Count | Frequency (%)
(empty token) | 21927 | 23.0%
jupyter | 14972 | 15.7%
notebooks | 7486 | 7.8%
etc | 7486 | 7.8%
jupyterlab | 7486 | 7.8%
visual | 6468 | 6.8%
studio | 6468 | 6.8%
rstudio | 3334 | 3.5%
code | 3234 | 3.4%
pycharm | 2999 | 3.1%
Other values (10) | 13680 | 14.3%

Most occurring characters

Value | Count | Frequency (%)
(space) | 129730 | 19.5%
t | 53060 | 8.0%
e | 49519 | 7.4%
u | 40541 | 6.1%
o | 39151 | 5.9%
, | 32273 | 4.9%
r | 28028 | 4.2%
y | 27509 | 4.1%
p | 27007 | 4.1%
J | 22458 | 3.4%
Other values (29) | 216055 | 32.5%

Most occurring categories, scripts and blocks

All 665331 characters fall into a single "(unknown)" category, script and block (100.0%), so the per-category, per-script and per-block character breakdowns are identical to the "Most occurring characters" table.
Which of the following hosted notebook products do you use on a regular basis?

Distinct228
Distinct (%)2.2%
Missing0
Missing (%)0.0%
Memory size159.8 KiB

Length

Max length295
Median length254
Mean length29.367948
Min length4

Characters and Unicode

Total characters300346
Distinct characters44
Distinct categories1
Distinct scripts1
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique96
Unique (%)0.9%

Sample

1st rowNone
2nd row Microsoft Azure Notebooks
3rd row Google Colab , Google Cloud Notebook Products (AI Platform, Datalab, etc)
4th rowNone
5th row Kaggle Notebooks (Kernels) , Google Colab , Binder / JupyterHub

Words

Value | Count | Frequency (%)
(empty token) | 5411 | 12.7%
notebooks | 5161 | 12.1%
none | 3870 | 9.1%
google | 3771 | 8.8%
kernels | 3225 | 7.5%
kaggle | 3225 | 7.5%
colab | 2982 | 7.0%
notebook | 1427 | 3.3%
products | 1427 | 3.3%
etc | 1427 | 3.3%
Other values (20) | 10816 | 25.3%

Most occurring characters

Value | Count | Frequency (%)
(space) | 47961 | 16.0%
o | 39506 | 13.2%
e | 30394 | 10.1%
l | 15641 | 5.2%
t | 14204 | 4.7%
b | 11585 | 3.9%
a | 11491 | 3.8%
s | 11037 | 3.7%
g | 10859 | 3.6%
N | 10458 | 3.5%
Other values (34) | 97210 | 32.4%

Most occurring categories, scripts and blocks

All 300346 characters fall into a single "(unknown)" category, script and block (100.0%), so the per-category, per-script and per-block character breakdowns are identical to the "Most occurring characters" table.
What programming languages do you use on a regular basis?

Distinct542
Distinct (%)5.3%
Missing0
Missing (%)0.0%
Memory size159.8 KiB

Length

Max length70
Median length58
Mean length15.099736
Min length1

Characters and Unicode

Total characters154425
Distinct characters29
Distinct categories1
Distinct scripts1
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique181
Unique (%)1.8%

Sample

1st rowPython, R, SQL, Java, Javascript, MATLAB
2nd rowPython, R, SQL, Bash
3rd rowPython, SQL
4th rowPython, R
5th rowPython, R, Bash

Words

Value | Count | Frequency (%)
python | 9016 | 33.4%
sql | 5218 | 19.3%
r | 3514 | 13.0%
c | 2185 | 8.1%
bash | 1685 | 6.2%
javascript | 1617 | 6.0%
java | 1501 | 5.6%
other | 966 | 3.6%
matlab | 922 | 3.4%
typescript | 331 | 1.2%

Most occurring characters

Value | Count | Frequency (%)
, | 16788 | 10.9%
(space) | 16788 | 10.9%
t | 11930 | 7.7%
h | 11667 | 7.6%
y | 9347 | 6.1%
o | 9076 | 5.9%
n | 9076 | 5.9%
P | 9016 | 5.8%
a | 7921 | 5.1%
L | 6140 | 4.0%
Other values (19) | 46676 | 30.2%

Most occurring categories, scripts and blocks

All 154425 characters fall into a single "(unknown)" category, script and block (100.0%), so the per-category, per-script and per-block character breakdowns are identical to the "Most occurring characters" table.
What data visualization libraries or tools do you use on a regular basis?

Distinct412
Distinct (%)4.0%
Missing0
Missing (%)0.0%
Memory size159.8 KiB

Length

Max length141
Median length123
Mean length30.944656
Min length4

Characters and Unicode

Total characters316471
Distinct characters38
Distinct categories1
Distinct scripts1
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique158
Unique (%)1.5%

Sample

1st row Matplotlib
2nd row Ggplot / ggplot2 , Matplotlib , Seaborn
3rd row Matplotlib , Plotly / Plotly Express , Seaborn
4th row Ggplot / ggplot2
5th row Matplotlib , Plotly / Plotly Express , Bokeh , Seaborn

Words

Value | Count | Frequency (%)
(empty token) | 18890 | 37.4%
matplotlib | 7288 | 14.4%
plotly | 4994 | 9.9%
seaborn | 4862 | 9.6%
ggplot | 3177 | 6.3%
ggplot2 | 3177 | 6.3%
express | 2497 | 4.9%
shiny | 1079 | 2.1%
none | 915 | 1.8%
d3.js | 903 | 1.8%
Other values (6) | 2722 | 5.4%

Most occurring characters

Value | Count | Frequency (%)
(space) | 70917 | 22.4%
l | 32894 | 10.4%
t | 27349 | 8.6%
o | 26638 | 8.4%
p | 16603 | 5.2%
, | 12758 | 4.0%
a | 12740 | 4.0%
b | 12614 | 4.0%
e | 10864 | 3.4%
g | 9531 | 3.0%
Other values (28) | 83563 | 26.4%

Most occurring categories, scripts and blocks

All 316471 characters fall into a single "(unknown)" category, script and block (100.0%), so the per-category, per-script and per-block character breakdowns are identical to the "Most occurring characters" table.
Which types of specialized hardware do you use on a regular basis?

Distinct14
Distinct (%)0.1%
Missing0
Missing (%)0.0%
Memory size159.8 KiB
CPUs | 3687
CPUs, GPUs | 3680
None / I do not know | 1683
GPUs | 751
CPUs, GPUs, TPUs | 248
Other values (9) | 178

Length

Max length23
Median length20
Mean length9.1786448
Min length4

Characters and Unicode

Total characters93870
Distinct characters21
Distinct categories1
Distinct scripts1
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique1
Unique (%)< 0.1%

Sample

1st rowCPUs, GPUs
2nd rowCPUs, GPUs
3rd rowCPUs, GPUs
4th rowCPUs, GPUs
5th rowCPUs, GPUs

Common Values

Value | Count | Frequency (%)
CPUs | 3687 | 36.1%
CPUs, GPUs | 3680 | 36.0%
None / I do not know | 1683 | 16.5%
GPUs | 751 | 7.3%
CPUs, GPUs, TPUs | 248 | 2.4%
GPUs, TPUs | 50 | 0.5%
Other | 40 | 0.4%
CPUs, TPUs | 23 | 0.2%
CPUs, GPUs, Other | 21 | 0.2%
TPUs | 21 | 0.2%
Other values (4) | 23 | 0.2%

Length

Histogram of lengths of the category

Words

Value | Count | Frequency (%)
cpus | 7677 | 33.4%
gpus | 4760 | 20.7%
none | 1683 | 7.3%
(empty token) | 1683 | 7.3%
i | 1683 | 7.3%
do | 1683 | 7.3%
not | 1683 | 7.3%
know | 1683 | 7.3%
tpus | 348 | 1.5%
other | 84 | 0.4%

Most occurring characters

Value | Count | Frequency (%)
P | 12785 | 13.6%
U | 12785 | 13.6%
s | 12785 | 13.6%
(space) | 12740 | 13.6%
C | 7677 | 8.2%
o | 6732 | 7.2%
n | 5049 | 5.4%
G | 4760 | 5.1%
, | 4325 | 4.6%
t | 1767 | 1.9%
Other values (11) | 12465 | 13.3%

Most occurring categories, scripts and blocks

All 93870 characters fall into a single "(unknown)" category, script and block (100.0%), so the per-category, per-script and per-block character breakdowns are identical to the "Most occurring characters" table.
Which of the following ML algorithms do you use on a regular basis?

Distinct630
Distinct (%)6.2%
Missing0
Missing (%)0.0%
Memory size159.8 KiB

Length

Max length336
Median length288
Mean length103.95815
Min length4

Characters and Unicode

Total characters1063180
Distinct characters43
Distinct categories1
Distinct scripts1
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique230
Unique (%)2.2%

Sample

1st rowLinear or Logistic Regression
2nd rowLinear or Logistic Regression, Convolutional Neural Networks
3rd rowLinear or Logistic Regression, Decision Trees or Random Forests, Gradient Boosting Machines (xgboost, lightgbm, etc)
4th rowLinear or Logistic Regression, Decision Trees or Random Forests, Gradient Boosting Machines (xgboost, lightgbm, etc), Bayesian Approaches, Convolutional Neural Networks, Generative Adversarial Networks, Recurrent Neural Networks
5th rowLinear or Logistic Regression, Dense Neural Networks (MLPs, etc), Convolutional Neural Networks, Recurrent Neural Networks

Words

Value | Count | Frequency (%)
or | 13843 | 10.4%
networks | 10147 | 7.7%
neural | 8736 | 6.6%
linear | 7454 | 5.6%
logistic | 7454 | 5.6%
regression | 7454 | 5.6%
etc | 7434 | 5.6%
decision | 6389 | 4.8%
trees | 6389 | 4.8%
random | 6389 | 4.8%
Other values (20) | 50827 | 38.4%

Most occurring characters

Value | Count | Frequency (%)
(space) | 122289 | 11.5%
e | 103627 | 9.7%
o | 93274 | 8.8%
s | 83531 | 7.9%
r | 78498 | 7.4%
i | 68652 | 6.5%
n | 58973 | 5.5%
t | 57547 | 5.4%
a | 47650 | 4.5%
, | 35206 | 3.3%
Other values (33) | 313933 | 29.5%

Most occurring categories, scripts and blocks

All 1063180 characters fall into a single "(unknown)" category, script and block (100.0%), so the per-category, per-script and per-block character breakdowns are identical to the "Most occurring characters" table.
Which categories of ML tools do you use on a regular basis?

Distinct92
Distinct (%)0.9%
Missing0
Missing (%)0.0%
Memory size159.8 KiB

Length

Max length374
Median length4
Mean length46.460057
Min length4

Characters and Unicode

Total characters475147
Distinct characters41
Distinct categories1
Distinct scripts1
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique17
Unique (%)0.2%

Sample

1st rowNone
2nd rowAutomation of full ML pipelines (e.g. Google AutoML, H20 Driverless AI)
3rd rowNone
4th rowAutomated model selection (e.g. auto-sklearn, xcessiv), Automated hyperparameter tuning (e.g. hyperopt, ray.tune), Automation of full ML pipelines (e.g. Google AutoML, H20 Driverless AI)
5th rowAutomated data augmentation (e.g. imgaug, albumentations), Automated feature engineering/selection (e.g. tpot, boruta_py), Automated model selection (e.g. auto-sklearn, xcessiv), Automation of full ML pipelines (e.g. Google AutoML, H20 Driverless AI)

Words

Value | Count | Frequency (%)
e.g | 7591 | 13.4%
automated | 6645 | 11.8%
none | 5708 | 10.1%
model | 2625 | 4.6%
selection | 2257 | 4.0%
auto-sklearn | 2257 | 4.0%
xcessiv | 2257 | 4.0%
tuning | 1463 | 2.6%
ray.tune | 1463 | 2.6%
hyperopt | 1463 | 2.6%
Other values (24) | 22805 | 40.3%

Most occurring characters

Value | Count | Frequency (%)
e | 56883 | 12.0%
(space) | 46307 | 9.7%
t | 40565 | 8.5%
o | 32966 | 6.9%
a | 29760 | 6.3%
n | 27099 | 5.7%
u | 21484 | 4.5%
i | 17800 | 3.7%
r | 16781 | 3.5%
. | 16645 | 3.5%
Other values (31) | 168857 | 35.5%

Most occurring categories, scripts and blocks

All 475147 characters fall into a single "(unknown)" category, script and block (100.0%), so the per-category, per-script and per-block character breakdowns are identical to the "Most occurring characters" table.
Which of the following machine learning frameworks do you use on a regular basis?

Distinct554
Distinct (%)5.4%
Missing0
Missing (%)0.0%
Memory size159.8 KiB

Length

Max length129
Median length107
Mean length36.541606
Min length4

Characters and Unicode

Total characters373711
Distinct characters37
Distinct categories1
Distinct scripts1
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique194
Unique (%)1.9%

Sample

1st rowNone
2nd row Scikit-learn , TensorFlow , Keras , RandomForest
3rd row Scikit-learn , RandomForest, Xgboost , LightGBM
4th row Scikit-learn , TensorFlow , Keras , RandomForest, Xgboost , Caret
5th row Scikit-learn , TensorFlow , Keras , PyTorch

Words

Value | Count | Frequency (%)
(empty token) | 17585 | 35.9%
scikit-learn | 6883 | 14.1%
keras | 4265 | 8.7%
tensorflow | 4233 | 8.6%
randomforest | 3457 | 7.1%
xgboost | 3367 | 6.9%
pytorch | 2517 | 5.1%
lightgbm | 1734 | 3.5%
none | 1302 | 2.7%
caret | 984 | 2.0%
Other values (4) | 2617 | 5.3%

Most occurring characters

Value | Count | Frequency (%)
(space) | 86729 | 23.2%
o | 25933 | 6.9%
r | 23427 | 6.3%
e | 21408 | 5.7%
, | 20328 | 5.4%
a | 17843 | 4.8%
t | 17434 | 4.7%
i | 17029 | 4.6%
s | 16047 | 4.3%
n | 15875 | 4.2%
Other values (27) | 111658 | 29.9%

Most occurring categories, scripts and blocks

All 373711 characters fall into a single "(unknown)" category, script and block (100.0%), so the per-category, per-script and per-block character breakdowns are identical to the "Most occurring characters" table.

Correlations

Variables (numbered as in the matrix below):
1. Approximately how many individuals are responsible for data science workloads at your place of business?
2. Approximately how much money have you spent on machine learning and/or cloud computing products at your work in the past 5 years?
3. Does your current employer incorporate machine learning methods into their business?
4. For how many years have you used machine learning methods?
5. Have you ever used a TPU (tensor processing unit)?
6. How long have you been writing code to analyze data (at work or at school)?
7. Select the title most similar to your current role (or most recent title if retired)
8. What is the highest level of formal education that you have attained or plan to attain within the next 2 years?
9. What is the size of the company where you are employed?
10. What is your age (# years)?
11. What is your current yearly compensation (approximate $USD)?
12. What is your gender?
13. What programming language would you recommend an aspiring data scientist to learn first?
14. Which types of specialized hardware do you use on a regular basis?

# | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14
1 | 1.000 | 0.150 | 0.244 | 0.112 | 0.034 | 0.123 | 0.112 | 0.054 | 0.301 | 0.033 | 0.104 | 0.023 | 0.024 | 0.036
2 | 0.150 | 1.000 | 0.167 | 0.157 | 0.073 | 0.144 | 0.087 | 0.045 | 0.102 | 0.096 | 0.196 | 0.048 | 0.037 | 0.088
3 | 0.244 | 0.167 | 1.000 | 0.202 | 0.051 | 0.143 | 0.162 | 0.062 | 0.121 | 0.050 | 0.137 | 0.032 | 0.041 | 0.098
4 | 0.112 | 0.157 | 0.202 | 1.000 | 0.086 | 0.463 | 0.154 | 0.147 | 0.036 | 0.160 | 0.165 | 0.065 | 0.050 | 0.112
5 | 0.034 | 0.073 | 0.051 | 0.086 | 1.000 | 0.050 | 0.045 | 0.018 | 0.023 | 0.043 | 0.046 | 0.042 | 0.035 | 0.280
6 | 0.123 | 0.144 | 0.143 | 0.463 | 0.050 | 1.000 | 0.145 | 0.149 | 0.058 | 0.282 | 0.228 | 0.050 | 0.061 | 0.077
7 | 0.112 | 0.087 | 0.162 | 0.154 | 0.045 | 0.145 | 1.000 | 0.180 | 0.053 | 0.084 | 0.068 | 0.086 | 0.080 | 0.085
8 | 0.054 | 0.045 | 0.062 | 0.147 | 0.018 | 0.149 | 0.180 | 1.000 | 0.060 | 0.152 | 0.086 | 0.052 | 0.051 | 0.037
9 | 0.301 | 0.102 | 0.121 | 0.036 | 0.023 | 0.058 | 0.053 | 0.060 | 1.000 | 0.071 | 0.137 | 0.030 | 0.040 | 0.040
10 | 0.033 | 0.096 | 0.050 | 0.160 | 0.043 | 0.282 | 0.084 | 0.152 | 0.071 | 1.000 | 0.148 | 0.064 | 0.056 | 0.034
11 | 0.104 | 0.196 | 0.137 | 0.165 | 0.046 | 0.228 | 0.068 | 0.086 | 0.137 | 0.148 | 1.000 | 0.077 | 0.038 | 0.044
12 | 0.023 | 0.048 | 0.032 | 0.065 | 0.042 | 0.050 | 0.086 | 0.052 | 0.030 | 0.064 | 0.077 | 1.000 | 0.040 | 0.163
13 | 0.024 | 0.037 | 0.041 | 0.050 | 0.035 | 0.061 | 0.080 | 0.051 | 0.040 | 0.056 | 0.038 | 0.040 | 1.000 | 0.070
14 | 0.036 | 0.088 | 0.098 | 0.112 | 0.280 | 0.077 | 0.085 | 0.037 | 0.040 | 0.034 | 0.044 | 0.163 | 0.070 | 1.000
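The matrix is symmetric with 1.000 on the diagonal; its largest off-diagonal value, 0.463, links years of machine-learning use with years of writing code to analyze data. For pairs of categorical variables, ydata-profiling's default correlation setting is assumed here to be based on Cramér's V, V = sqrt(chi2 / (n * (min(r, c) - 1))) for an r-by-c contingency table. The sketch computes the plain, uncorrected statistic, so it may differ slightly from the report if a bias correction is applied; `cramers_v` is a hypothetical helper and the inputs in the usage lines are made up.

```python
import math
from collections import Counter

def cramers_v(xs, ys):
    """Uncorrected Cramér's V between two equal-length categorical sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))   # observed contingency-table cells
    row = Counter(xs)              # row totals
    col = Counter(ys)              # column totals
    k = min(len(row), len(col))
    if k < 2:
        return 0.0  # undefined when either variable has a single category
    chi2 = 0.0
    for x in row:
        for y in col:
            expected = row[x] * col[y] / n
            observed = joint.get((x, y), 0)
            chi2 += (observed - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * (k - 1)))

if __name__ == "__main__":
    print(cramers_v(["a", "a", "b", "b"], ["x", "x", "y", "y"]))  # perfectly associated
    print(cramers_v(["a", "a", "b", "b"], ["x", "y", "x", "y"]))  # independent
```

The first call gives 1.0 and the second 0.0, the two ends of the scale used in the matrix.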

Missing values

A simple visualization of nullity by column.
Figure: Nullity matrix. The nullity matrix is a data-dense display that lets you quickly pick out visual patterns in data completion.
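Both views come from the same data. The nullity matrix holds one True/False flag per cell, and the per-column chart shows the column sums of that matrix. The sketch below builds both in plain Python. It assumes records are dicts and that a cell counts as missing when it is `None` or an empty string; this is an illustrative rule, whereas ydata-profiling itself works on pandas null values. The overview reports 0 missing cells for this dataset, so every count here would be zero.

```python
def is_missing(value):
    """Illustrative missing-value rule (assumption): None or empty string."""
    return value is None or value == ""


def nullity_matrix(records, columns):
    """One row of booleans per record: True where that cell is missing."""
    return [[is_missing(rec.get(col)) for col in columns] for rec in records]


def nullity_by_column(matrix, columns):
    """Count missing cells per column (the bar-chart view)."""
    counts = {col: 0 for col in columns}
    for row in matrix:
        for col, missing in zip(columns, row):
            counts[col] += int(missing)
    return counts


if __name__ == "__main__":
    cols = ["age", "gender"]
    recs = [
        {"age": "22-24", "gender": "Male"},
        {"age": "30-34", "gender": None},
    ]
    m = nullity_matrix(recs, cols)
    print(m)                           # [[False, False], [False, True]]
    print(nullity_by_column(m, cols))  # {'age': 0, 'gender': 1}
```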

Sample

Index | What is your age (# years)? | What is your gender? | In which country do you currently reside? | What is the highest level of formal education that you have attained or plan to attain within the next 2 years? | Select the title most similar to your current role (or most recent title if retired) | What is the size of the company where you are employed? | Approximately how many individuals are responsible for data science workloads at your place of business? | Does your current employer incorporate machine learning methods into their business? | What is your current yearly compensation (approximate $USD)? | Approximately how much money have you spent on machine learning and/or cloud computing products at your work in the past 5 years? | What is the primary tool that you use at work or school to analyze data? | How long have you been writing code to analyze data (at work or at school)? | What programming language would you recommend an aspiring data scientist to learn first? | Have you ever used a TPU (tensor processing unit)? | For how many years have you used machine learning methods? | Who/what are your favorite media sources that report on data science topics? | On which platforms have you begun or completed data science courses? | Which of the following integrated development environments (IDE's) do you use on a regular basis? | Which of the following hosted notebook products do you use on a regular basis? | What programming languages do you use on a regular basis? | What data visualization libraries or tools do you use on a regular basis? | Which types of specialized hardware do you use on a regular basis? | Which of the following ML algorithms do you use on a regular basis? | Which categories of ML tools do you use on a regular basis? | Which of the following machine learning frameworks do you use on a regular basis?
0 | 22-24 | Male | France | Master’s degree | Software Engineer | 1000-9,999 employees | 0 | I do not know | 30,000-39,999 | $0 (USD) | Basic statistical software (Microsoft Excel, Google Sheets, etc.), 0, -1, -1, -1, -1 | 1-2 years | Python | Never | 1-2 years | Twitter (data science influencers), Kaggle (forums, blog, social media, etc), Blogs (Towards Data Science, Medium, Analytics Vidhya, KDnuggets etc), Journal Publications (traditional publications, preprint journals, etc) | Coursera, DataCamp, Kaggle Courses (i.e. Kaggle Learn), Udemy | Jupyter (JupyterLab, Jupyter Notebooks, etc) , RStudio , PyCharm , MATLAB , Spyder | None | Python, R, SQL, Java, Javascript, MATLAB | Matplotlib | CPUs, GPUs | Linear or Logistic Regression | None | None
3 | 40-44 | Male | Australia | Master’s degree | Other | > 10,000 employees | 20+ | I do not know | 250,000-299,999 | $10,000-$99,999 | Local development environments (RStudio, JupyterLab, etc.), -1, -1, -1, 0, -1 | 1-2 years | Python | Once | 2-3 years | Podcasts (Chai Time Data Science, Linear Digressions, etc), Blogs (Towards Data Science, Medium, Analytics Vidhya, KDnuggets etc), Journal Publications (traditional publications, preprint journals, etc), Slack Communities (ods.ai, kagglenoobs, etc) | Coursera, edX, DataCamp, University Courses (resulting in a university degree) | Jupyter (JupyterLab, Jupyter Notebooks, etc) , Visual Studio / Visual Studio Code | Microsoft Azure Notebooks | Python, R, SQL, Bash | Ggplot / ggplot2 , Matplotlib , Seaborn | CPUs, GPUs | Linear or Logistic Regression, Convolutional Neural Networks | Automation of full ML pipelines (e.g. Google AutoML, H20 Driverless AI) | Scikit-learn , TensorFlow , Keras , RandomForest
4 | 22-24 | Male | India | Bachelor’s degree | Other | 0-49 employees | 0 | No (we do not use ML methods) | 4,000-4,999 | $0 (USD) | Local development environments (RStudio, JupyterLab, etc.), -1, -1, -1, 1, -1 | < 1 years | Python | Never | < 1 years | YouTube (Cloud AI Adventures, Siraj Raval, etc), Blogs (Towards Data Science, Medium, Analytics Vidhya, KDnuggets etc), Other | Other | Jupyter (JupyterLab, Jupyter Notebooks, etc) | Google Colab , Google Cloud Notebook Products (AI Platform, Datalab, etc) | Python, SQL | Matplotlib , Plotly / Plotly Express , Seaborn | CPUs, GPUs | Linear or Logistic Regression, Decision Trees or Random Forests, Gradient Boosting Machines (xgboost, lightgbm, etc) | None | Scikit-learn , RandomForest, Xgboost , LightGBM
5 | 50-54 | Male | France | Master’s degree | Data Scientist | 0-49 employees | 3-4 | We have well established ML methods (i.e., models in production for more than 2 years) | 60,000-69,999 | $10,000-$99,999 | Advanced statistical software (SPSS, SAS, etc.), -1, 0, -1, -1, -1 | 20+ years | Java | Never | 10-15 years | YouTube (Cloud AI Adventures, Siraj Raval, etc), Blogs (Towards Data Science, Medium, Analytics Vidhya, KDnuggets etc), Journal Publications (traditional publications, preprint journals, etc) | None | RStudio , Other | None | Python, R | Ggplot / ggplot2 | CPUs, GPUs | Linear or Logistic Regression, Decision Trees or Random Forests, Gradient Boosting Machines (xgboost, lightgbm, etc), Bayesian Approaches, Convolutional Neural Networks, Generative Adversarial Networks, Recurrent Neural Networks | Automated model selection (e.g. auto-sklearn, xcessiv), Automated hyperparameter tuning (e.g. hyperopt, ray.tune), Automation of full ML pipelines (e.g. Google AutoML, H20 Driverless AI) | Scikit-learn , TensorFlow , Keras , RandomForest, Xgboost , Caret
6 | 22-24 | Male | India | Master’s degree | Data Scientist | 50-249 employees | 20+ | We are exploring ML methods (and may one day put a model into production) | 10,000-14,999 | $100-$999 | Local development environments (RStudio, JupyterLab, etc.), -1, -1, -1, 2, -1 | 3-5 years | Python | 6-24 times | 2-3 years | Kaggle (forums, blog, social media, etc), Course Forums (forums.fast.ai, etc), YouTube (Cloud AI Adventures, Siraj Raval, etc), Podcasts (Chai Time Data Science, Linear Digressions, etc), Journal Publications (traditional publications, preprint journals, etc) | Udacity, Coursera, edX, Kaggle Courses (i.e. Kaggle Learn), Udemy | Jupyter (JupyterLab, Jupyter Notebooks, etc) , Spyder , Notepad++ , Sublime Text | Kaggle Notebooks (Kernels) , Google Colab , Binder / JupyterHub | Python, R, Bash | Matplotlib , Plotly / Plotly Express , Bokeh , Seaborn | CPUs, GPUs | Linear or Logistic Regression, Dense Neural Networks (MLPs, etc), Convolutional Neural Networks, Recurrent Neural Networks | Automated data augmentation (e.g. imgaug, albumentations), Automated feature engineering/selection (e.g. tpot, boruta_py), Automated model selection (e.g. auto-sklearn, xcessiv), Automation of full ML pipelines (e.g. Google AutoML, H20 Driverless AI) | Scikit-learn , TensorFlow , Keras , PyTorch
7 | 22-24 | Female | United States of America | Bachelor’s degree | Data Scientist | > 10,000 employees | 20+ | We recently started using ML methods (i.e., models in production for less than 2 years) | 80,000-89,999 | $0 (USD) | Local development environments (RStudio, JupyterLab, etc.), -1, -1, -1, 3, -1 | 3-5 years | Python | Once | 3-4 years | Hacker News (https://news.ycombinator.com/), Blogs (Towards Data Science, Medium, Analytics Vidhya, KDnuggets etc), Journal Publications (traditional publications, preprint journals, etc) | Udemy, University Courses (resulting in a university degree) | Jupyter (JupyterLab, Jupyter Notebooks, etc) , Spyder | Microsoft Azure Notebooks , AWS Notebook Products (EMR Notebooks, Sagemaker Notebooks, etc) | Python | Matplotlib , Plotly / Plotly Express | CPUs | Linear or Logistic Regression, Decision Trees or Random Forests, Convolutional Neural Networks | None | Scikit-learn , TensorFlow , Keras , Spark MLib
9 | 55-59 | Male | Netherlands | Master’s degree | Other | 0-49 employees | 1-2 | We are exploring ML methods (and may one day put a model into production) | $0-999 | $100-$999 | Local development environments (RStudio, JupyterLab, etc.), -1, -1, -1, 5, -1 | 5-10 years | Python | Never | < 1 years | Kaggle (forums, blog, social media, etc), Course Forums (forums.fast.ai, etc), Blogs (Towards Data Science, Medium, Analytics Vidhya, KDnuggets etc) | Coursera | Jupyter (JupyterLab, Jupyter Notebooks, etc) | None | Python, SQL | Matplotlib , D3.js , Seaborn | CPUs | Linear or Logistic Regression, Bayesian Approaches, Generative Adversarial Networks | None | Scikit-learn , PyTorch
11 | 30-34 | Male | Germany | Master’s degree | Statistician | 0-49 employees | 5-9 | We recently started using ML methods (i.e., models in production for less than 2 years) | 2,000-2,999 | $1000-$9,999 | Basic statistical software (Microsoft Excel, Google Sheets, etc.), 2, -1, -1, -1, -1 | 5-10 years | R | 2-5 times | 4-5 years | Podcasts (Chai Time Data Science, Linear Digressions, etc) | Coursera | Jupyter (JupyterLab, Jupyter Notebooks, etc) | Code Ocean | R | Matplotlib | CPUs | Bayesian Approaches | Automated data augmentation (e.g. imgaug, albumentations) | Scikit-learn
12 | 30-34 | Male | Germany | Bachelor’s degree | Data Scientist | 50-249 employees | 5-9 | We recently started using ML methods (i.e., models in production for less than 2 years) | 70,000-79,999 | $1000-$9,999 | Local development environments (RStudio, JupyterLab, etc.), -1, -1, -1, 6, -1 | 5-10 years | R | Never | 4-5 years | None | edX | Jupyter (JupyterLab, Jupyter Notebooks, etc) , RStudio | None | Python, R | Ggplot / ggplot2 | CPUs | Linear or Logistic Regression, Decision Trees or Random Forests, Gradient Boosting Machines (xgboost, lightgbm, etc), Bayesian Approaches, Dense Neural Networks (MLPs, etc) | None | Keras , Caret
13 | 30-34 | Male | United States of America | Master’s degree | Product/Project Manager | > 10,000 employees | 20+ | I do not know | 90,000-99,999 | $0 (USD) | Basic statistical software (Microsoft Excel, Google Sheets, etc.), 1, -1, -1, -1, -1 | 3-5 years | Python | Never | 2-3 years | Hacker News (https://news.ycombinator.com/), Reddit (r/machinelearning, r/datascience, etc), Kaggle (forums, blog, social media, etc), Blogs (Towards Data Science, Medium, Analytics Vidhya, KDnuggets etc) | Udacity, Coursera, DataQuest, Kaggle Courses (i.e. Kaggle Learn), Fast.ai, Udemy, University Courses (resulting in a university degree) | Jupyter (JupyterLab, Jupyter Notebooks, etc) , PyCharm , Atom , Notepad++ , Sublime Text | Kaggle Notebooks (Kernels) , Google Colab , Google Cloud Notebook Products (AI Platform, Datalab, etc) , Code Ocean | Python | Matplotlib , Plotly / Plotly Express , Seaborn | None / I do not know | Linear or Logistic Regression, Decision Trees or Random Forests, Bayesian Approaches | None | Scikit-learn , RandomForest
Index | What is your age (# years)? | What is your gender? | In which country do you currently reside? | What is the highest level of formal education that you have attained or plan to attain within the next 2 years? | Select the title most similar to your current role (or most recent title if retired) | What is the size of the company where you are employed? | Approximately how many individuals are responsible for data science workloads at your place of business? | Does your current employer incorporate machine learning methods into their business? | What is your current yearly compensation (approximate $USD)? | Approximately how much money have you spent on machine learning and/or cloud computing products at your work in the past 5 years? | What is the primary tool that you use at work or school to analyze data? | How long have you been writing code to analyze data (at work or at school)? | What programming language would you recommend an aspiring data scientist to learn first? | Have you ever used a TPU (tensor processing unit)? | For how many years have you used machine learning methods? | Who/what are your favorite media sources that report on data science topics? | On which platforms have you begun or completed data science courses? | Which of the following integrated development environments (IDE's) do you use on a regular basis? | Which of the following hosted notebook products do you use on a regular basis? | What programming languages do you use on a regular basis? | What data visualization libraries or tools do you use on a regular basis? | Which types of specialized hardware do you use on a regular basis? | Which of the following ML algorithms do you use on a regular basis? | Which categories of ML tools do you use on a regular basis? | Which of the following machine learning frameworks do you use on a regular basis?
19299 | 50-54 | Male | France | Some college/university study without earning a bachelor’s degree | Data Scientist | 0-49 employees | 3-4 | We use ML methods for generating insights (but do not put working models into production) | 100,000-124,999 | $10,000-$99,999 | Local development environments (RStudio, JupyterLab, etc.), -1, -1, -1, 125, -1 | 5-10 years | Python | 6-24 times | 4-5 years | Twitter (data science influencers), Hacker News (https://news.ycombinator.com/), Reddit (r/machinelearning, r/datascience, etc), Kaggle (forums, blog, social media, etc), Course Forums (forums.fast.ai, etc), YouTube (Cloud AI Adventures, Siraj Raval, etc), Blogs (Towards Data Science, Medium, Analytics Vidhya, KDnuggets etc), Journal Publications (traditional publications, preprint journals, etc), Slack Communities (ods.ai, kagglenoobs, etc) | Udacity, Coursera, edX, Kaggle Courses (i.e. Kaggle Learn), University Courses (resulting in a university degree), Other | Jupyter (JupyterLab, Jupyter Notebooks, etc) , RStudio , PyCharm , Visual Studio / Visual Studio Code | Kaggle Notebooks (Kernels) , Google Colab , Microsoft Azure Notebooks , Binder / JupyterHub | Python, SQL, C++ | Matplotlib , Shiny , Plotly / Plotly Express | TPUs | Linear or Logistic Regression, Decision Trees or Random Forests, Gradient Boosting Machines (xgboost, lightgbm, etc), Bayesian Approaches, Evolutionary Approaches, Dense Neural Networks (MLPs, etc), Convolutional Neural Networks, Generative Adversarial Networks, Recurrent Neural Networks, Transformer Networks (BERT, gpt-2, etc) | Automated data augmentation (e.g. imgaug, albumentations), Automated feature engineering/selection (e.g. tpot, boruta_py), Automated model selection (e.g. auto-sklearn, xcessiv) | Scikit-learn , TensorFlow , PyTorch , Spark MLib
19324 | 25-29 | Male | Nigeria | Doctoral degree | Data Scientist | 250-999 employees | 1-2 | We are exploring ML methods (and may one day put a model into production) | 1,000-1,999 | $100-$999 | Business intelligence software (Salesforce, Tableau, Spotfire, etc.), -1, -1, 337, -1, -1 | < 1 years | Python | Once | < 1 years | Reddit (r/machinelearning, r/datascience, etc), YouTube (Cloud AI Adventures, Siraj Raval, etc), Blogs (Towards Data Science, Medium, Analytics Vidhya, KDnuggets etc) | Udacity, edX, Udemy | Visual Studio / Visual Studio Code | Microsoft Azure Notebooks | Python, R, SQL | Ggplot / ggplot2 | GPUs | Bayesian Approaches | Automated data augmentation (e.g. imgaug, albumentations), Automated hyperparameter tuning (e.g. hyperopt, ray.tune), Automation of full ML pipelines (e.g. Google AutoML, H20 Driverless AI) | Fast.ai
19338 | 18-21 | Male | Nigeria | Bachelor’s degree | Data Analyst | 250-999 employees | 5-9 | I do not know | 5,000-7,499 | $1000-$9,999 | Advanced statistical software (SPSS, SAS, etc.), -1, 253, -1, -1, -1 | < 1 years | R | Never | < 1 years | Hacker News (https://news.ycombinator.com/), Kaggle (forums, blog, social media, etc) | DataCamp | RStudio | None | Python, R | Ggplot / ggplot2 | CPUs | Linear or Logistic Regression | Automated feature engineering/selection (e.g. tpot, boruta_py) | None
19425 | 35-39 | Male | Saudi Arabia | Master’s degree | Data Scientist | > 10,000 employees | 5-9 | We have well established ML methods (i.e., models in production for more than 2 years) | 100,000-124,999 | $10,000-$99,999 | Other, -1, -1, -1, -1, -1 | 5-10 years | R | Never | 10-15 years | YouTube (Cloud AI Adventures, Siraj Raval, etc), Blogs (Towards Data Science, Medium, Analytics Vidhya, KDnuggets etc), Slack Communities (ods.ai, kagglenoobs, etc) | Coursera, DataCamp | RStudio | None | R, SQL | Ggplot / ggplot2 , Shiny , Plotly / Plotly Express , Leaflet / Folium | CPUs | Gradient Boosting Machines (xgboost, lightgbm, etc) | None | Xgboost , Caret
19442 | 25-29 | Male | Viet Nam | Master’s degree | Data Analyst | 50-249 employees | 1-2 | We are exploring ML methods (and may one day put a model into production) | $0-999 | $1-$99 | Local development environments (RStudio, JupyterLab, etc.), -1, -1, -1, 50, -1 | 1-2 years | Python | Never | 1-2 years | Kaggle (forums, blog, social media, etc), Course Forums (forums.fast.ai, etc), Blogs (Towards Data Science, Medium, Analytics Vidhya, KDnuggets etc) | Udacity, Coursera, Kaggle Courses (i.e. Kaggle Learn), Fast.ai, Udemy, LinkedIn Learning | Jupyter (JupyterLab, Jupyter Notebooks, etc) , Sublime Text | Kaggle Notebooks (Kernels) , Google Colab | Python | Matplotlib , Seaborn | CPUs | Linear or Logistic Regression, Decision Trees or Random Forests, Gradient Boosting Machines (xgboost, lightgbm, etc), Bayesian Approaches | Automated model selection (e.g. auto-sklearn, xcessiv), Automated hyperparameter tuning (e.g. hyperopt, ray.tune) | Scikit-learn , Xgboost
19443 | 25-29 | Male | India | Master’s degree | Data Scientist | 0-49 employees | 1-2 | We recently started using ML methods (i.e., models in production for less than 2 years) | 1,000-1,999 | $100-$999 | Local development environments (RStudio, JupyterLab, etc.), -1, -1, -1, 2838, -1 | 3-5 years | Python | Never | 2-3 years | Hacker News (https://news.ycombinator.com/), Kaggle (forums, blog, social media, etc), YouTube (Cloud AI Adventures, Siraj Raval, etc), Slack Communities (ods.ai, kagglenoobs, etc) | Kaggle Courses (i.e. Kaggle Learn), LinkedIn Learning | Jupyter (JupyterLab, Jupyter Notebooks, etc) , PyCharm , MATLAB , Notepad++ | Google Cloud Notebook Products (AI Platform, Datalab, etc) , AWS Notebook Products (EMR Notebooks, Sagemaker Notebooks, etc) | Python, MATLAB | Matplotlib | CPUs, GPUs | Linear or Logistic Regression, Decision Trees or Random Forests, Convolutional Neural Networks | Automated data augmentation (e.g. imgaug, albumentations) | Scikit-learn , TensorFlow , PyTorch , Spark MLib
19582 | 22-24 | Female | Other | Bachelor’s degree | Other | 50-249 employees | 1-2 | We are exploring ML methods (and may one day put a model into production) | 5,000-7,499 | $100-$999 | Local development environments (RStudio, JupyterLab, etc.), -1, -1, -1, 0, -1 | 1-2 years | Python | Never | 1-2 years | Other | Udacity, Coursera, edX, Kaggle Courses (i.e. Kaggle Learn), University Courses (resulting in a university degree), Other | Jupyter (JupyterLab, Jupyter Notebooks, etc) , Atom , Visual Studio / Visual Studio Code , Spyder | Google Colab | Python | Matplotlib , Seaborn | CPUs, GPUs | Linear or Logistic Regression, Decision Trees or Random Forests, Dense Neural Networks (MLPs, etc), Convolutional Neural Networks | Automated hyperparameter tuning (e.g. hyperopt, ray.tune) | Scikit-learn , TensorFlow , PyTorch
19663 | 25-29 | Male | China | I prefer not to answer | Data Engineer | 250-999 employees | 5-9 | We recently started using ML methods (i.e., models in production for less than 2 years) | 20,000-24,999 | $100-$999 | Local development environments (RStudio, JupyterLab, etc.), -1, -1, -1, 12, -1 | 1-2 years | Python | Once | 1-2 years | Other | Kaggle Courses (i.e. Kaggle Learn) | Jupyter (JupyterLab, Jupyter Notebooks, etc) , PyCharm | Google Colab | Python | Seaborn | GPUs | Dense Neural Networks (MLPs, etc), Recurrent Neural Networks | None | Scikit-learn , TensorFlow , Keras
19690 | 25-29 | Male | Australia | Bachelor’s degree | Other | 1000-9,999 employees | 5-9 | No (we do not use ML methods) | 60,000-69,999 | $10,000-$99,999 | Local development environments (RStudio, JupyterLab, etc.), -1, -1, -1, 14, -1 | 3-5 years | Python | Never | 1-2 years | Hacker News (https://news.ycombinator.com/), Reddit (r/machinelearning, r/datascience, etc), Kaggle (forums, blog, social media, etc), Podcasts (Chai Time Data Science, Linear Digressions, etc), Blogs (Towards Data Science, Medium, Analytics Vidhya, KDnuggets etc) | Coursera, edX, Fast.ai, Udemy | Jupyter (JupyterLab, Jupyter Notebooks, etc) , MATLAB , Visual Studio / Visual Studio Code | None | Python, SQL, MATLAB | Matplotlib , Plotly / Plotly Express , Bokeh , Seaborn | CPUs, GPUs | Linear or Logistic Regression, Decision Trees or Random Forests, Gradient Boosting Machines (xgboost, lightgbm, etc), Bayesian Approaches | None | Scikit-learn , TensorFlow , PyTorch
19716 | 50-54 | Male | France | Bachelor’s degree | Software Engineer | > 10,000 employees | 20+ | We have well established ML methods (i.e., models in production for more than 2 years) | 60,000-69,999 | $0 (USD) | Local development environments (RStudio, JupyterLab, etc.), -1, -1, -1, 25, -1 | 3-5 years | Python | Never | 4-5 years | Blogs (Towards Data Science, Medium, Analytics Vidhya, KDnuggets etc) | Coursera, edX, Udemy | Jupyter (JupyterLab, Jupyter Notebooks, etc) , Visual Studio / Visual Studio Code | IBM Watson Studio | Python, SQL, Java, Bash | Matplotlib | CPUs | Linear or Logistic Regression, Decision Trees or Random Forests | Automated model selection (e.g. auto-sklearn, xcessiv), Automated hyperparameter tuning (e.g. hyperopt, ray.tune) | Scikit-learn , Spark MLib
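Many of the sampled columns (media sources, course platforms, IDEs, hosted notebooks, languages, visualization tools, hardware, algorithms, ML tool categories, frameworks) store several answers in one comma-joined string. Some answers contain commas of their own inside parentheses, for example "Jupyter (JupyterLab, Jupyter Notebooks, etc)", so a naive split on "," would break them apart. The sketch below splits only on commas outside parentheses and trims the stray spaces that some answers carry. It assumes parentheses in the answers are balanced. "None" is kept as a literal answer rather than treated as missing, which is consistent with the report's 0 missing cells.

```python
def split_multiselect(cell):
    """Split a comma-joined multi-select answer, ignoring commas inside (...).

    Sketch under the assumption that parentheses are balanced; an unmatched
    ")" is tolerated by never letting the depth go negative.
    """
    items, current, depth = [], [], 0
    for ch in cell:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth = max(depth - 1, 0)
        if ch == "," and depth == 0:
            # Top-level comma: close the current item.
            items.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    tail = "".join(current).strip()
    if tail:
        items.append(tail)
    return items


if __name__ == "__main__":
    ides = "Jupyter (JupyterLab, Jupyter Notebooks, etc) , RStudio , PyCharm"
    print(split_multiselect(ides))
    # ['Jupyter (JupyterLab, Jupyter Notebooks, etc)', 'RStudio', 'PyCharm']
```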

Duplicate rows

Most frequently occurring

Index | What is your age (# years)? | What is your gender? | In which country do you currently reside? | What is the highest level of formal education that you have attained or plan to attain within the next 2 years? | Select the title most similar to your current role (or most recent title if retired) | What is the size of the company where you are employed? | Approximately how many individuals are responsible for data science workloads at your place of business? | Does your current employer incorporate machine learning methods into their business? | What is your current yearly compensation (approximate $USD)? | Approximately how much money have you spent on machine learning and/or cloud computing products at your work in the past 5 years? | What is the primary tool that you use at work or school to analyze data? | How long have you been writing code to analyze data (at work or at school)? | What programming language would you recommend an aspiring data scientist to learn first? | Have you ever used a TPU (tensor processing unit)? | For how many years have you used machine learning methods? | Who/what are your favorite media sources that report on data science topics? | On which platforms have you begun or completed data science courses? | Which of the following integrated development environments (IDE's) do you use on a regular basis? | Which of the following hosted notebook products do you use on a regular basis? | What programming languages do you use on a regular basis? | What data visualization libraries or tools do you use on a regular basis? | Which types of specialized hardware do you use on a regular basis? | Which of the following ML algorithms do you use on a regular basis? | Which categories of ML tools do you use on a regular basis? | Which of the following machine learning frameworks do you use on a regular basis? | # duplicates
0 | 30-34 | Male | Netherlands | Doctoral degree | Research Scientist | 1000-9,999 employees | 20+ | I do not know | 50,000-59,999 | $0 (USD) | Basic statistical software (Microsoft Excel, Google Sheets, etc.), 1, -1, -1, -1, -1 | 5-10 years | Python | Never | < 1 years | Course Forums (forums.fast.ai, etc), Blogs (Towards Data Science, Medium, Analytics Vidhya, KDnuggets etc) | Coursera, Fast.ai | Jupyter (JupyterLab, Jupyter Notebooks, etc) , Atom , MATLAB | Google Colab | Python, MATLAB | Matplotlib | CPUs, GPUs | Convolutional Neural Networks | None | Fast.ai | 2
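The overview reports 1 duplicate row, and this table lists that row with a "# duplicates" count of 2. Both numbers fit one distinct row appearing twice: the first counts repeats beyond the first occurrence, and the second counts all occurrences. This reading is an interpretation, not a confirmed description of ydata-profiling's internals. The sketch below derives both numbers in plain Python, assuming each row is a sequence of hashable cell values.

```python
from collections import Counter


def duplicate_summary(rows):
    """Return (repeated-row count, duplicated rows with occurrence counts).

    The first value counts rows beyond the first occurrence of each distinct
    row (similar in spirit to pandas' keep="first"); the second lists only
    rows seen more than once, most frequent first.
    """
    counts = Counter(tuple(row) for row in rows)
    n_repeated = sum(c - 1 for c in counts.values())
    most_frequent = [(row, c) for row, c in counts.most_common() if c > 1]
    return n_repeated, most_frequent


if __name__ == "__main__":
    # Toy rows (not from this dataset): one row appears twice.
    rows = [("30-34", "Male"), ("30-34", "Male"), ("25-29", "Female")]
    print(duplicate_summary(rows))  # (1, [(('30-34', 'Male'), 2)])
```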